xen.git
16 years agox86: improve reporting through XENMEM_machine_memory_map
Keir Fraser [Tue, 3 Nov 2009 12:40:28 +0000 (12:40 +0000)]
x86: improve reporting through XENMEM_machine_memory_map

Since Dom0 derives machine address ranges usable for assigning PCI
device resources from the output of this sub-hypercall, Xen should
make sure it properly reports all ranges not suitable for this (as
either reserved or unusable):
- RAM regions excluded via command line option
- memory regions used by Xen itself (LAPIC, IOAPICs)

While the latter should generally already be excluded by the
BIOS-provided E820 table, this apparently isn't always the case, at
least for IOAPICs. Since Linux has been changed to account for this,
it seems to make sense to do so in Xen as well.

Generally the HPET range should also be excluded here, but since it
isn't being reflected in Dom0's iomem_caps (and can't be, as it's a
sub-page range) I wasn't sure whether adding explicit code for doing
so would be reasonable.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agox86: Clean up APIC local timer handling.
Keir Fraser [Tue, 3 Nov 2009 09:33:22 +0000 (09:33 +0000)]
x86: Clean up APIC local timer handling.

1. Writing TMICT=0 disables the timer. Use this fact to simplify and
improve reprogram_timer(). In particular, we always write TMICT, and
write zero when we do not need a timer interrupt.

2. In HPET broadcast timer handler, set TMICT=0 when we mask the APIC
local timer. May as well do this early, before entering deep sleep.

3. In HVM-guest APIC emulation, disable the emulated local timer when
the guest sets TMICT=0. Previously we would issue an immediate
one-shot interrupt.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agovmx: Disable vPMU feature by default
Keir Fraser [Tue, 3 Nov 2009 08:40:40 +0000 (08:40 +0000)]
vmx: Disable vPMU feature by default

Signed-off-by: Shan Haitao <haitao.shan@intel.com>
16 years agoLinux vbd hotplug: Speed up finding a loopback device
Keir Fraser [Tue, 3 Nov 2009 08:39:21 +0000 (08:39 +0000)]
Linux vbd hotplug: Speed up finding a loopback device

 - Use the device and inode information provided by losetup to find
   if the vbd backing file is in use on another vbd.

 - Use losetup to find a free loopback device.

Signed-off-by: Gary Grebus <gary.grebus@oracle.com>
16 years agoLinux vbd hotplug: Avoid "leaked" loopback devices
Keir Fraser [Tue, 3 Nov 2009 08:38:55 +0000 (08:38 +0000)]
Linux vbd hotplug: Avoid "leaked" loopback devices

Avoid races between hotplug "add" and "remove" leading to "leaked"
loopback devices.

- Don't setup loopback device if xend is no longer waiting for the
  vbd.
- Use the lock file to avoid add/remove races.

Signed-off-by: Gary Grebus <gary.grebus@oracle.com>
16 years agoxen-hvmctx: add recently added gtsc_khz field to output
Keir Fraser [Tue, 3 Nov 2009 08:37:52 +0000 (08:37 +0000)]
xen-hvmctx: add recently added gtsc_khz field to output

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
16 years agoFixes after addition of dummy_vcpu_info.
Keir Fraser [Mon, 2 Nov 2009 09:38:34 +0000 (09:38 +0000)]
Fixes after addition of dummy_vcpu_info.

 - Clean initialisation of new vcpu_info in map_vcpu_info() if the
 vcpu was previously using the shared dummy structure.
 - Don't allow a vcpu to run with the shared dummy info structure, as
 no good can come of it.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoExtend the max vcpu number for HVM guest.
Keir Fraser [Thu, 29 Oct 2009 14:48:28 +0000 (14:48 +0000)]
Extend the max vcpu number for HVM guest.
 - Originally the max vcpu number for an HVM guest was 32; this patch
 extends the number to 128 on the x86_64 hypervisor. (For the i386
 hypervisor, the max vcpu number is still 32.)
 - This patch extends the mp-table size to fit more vcpus.
 - HVM PV driver should call VCPUOP_register_vcpu_info hypercall to
 initialize the vcpu info if the vcpu number is more than 32.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoAMD IOMMU: remove a BUG_ON condition, to allow boot
Keir Fraser [Thu, 29 Oct 2009 14:05:46 +0000 (14:05 +0000)]
AMD IOMMU: remove a BUG_ON condition, to allow boot

Signed-off-by: Wei Wang <wei.wang2@amd.com>
16 years agostubdom: make stubdom-dm exit properly
Keir Fraser [Thu, 29 Oct 2009 14:04:45 +0000 (14:04 +0000)]
stubdom: make stubdom-dm exit properly

The built-in bash command wait should be able to take a pid argument
and just wait for the specified process to die, but it currently has a
bug: what it actually does is wait for the death of all the
children.  For this reason the stubdom-dm script doesn't exit properly
after stubdom destruction.  This patch solves the issue by spawning only
one child, removing the sleep subprocess workaround that was used to
create a usable stdin for "xm console" and replacing it with a fifo.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
16 years agoExtend max vcpu number for HVM guest
Keir Fraser [Thu, 29 Oct 2009 14:03:56 +0000 (14:03 +0000)]
Extend max vcpu number for HVM guest

Reduce size of Xen-qemu shared ioreq structure to 32 bytes. This
has two advantages:
 1. We can support up to 128 VCPUs with a single shared page
 2. If/when we want to go beyond 128 VCPUs, a whole number of ioreq_t
    structures will pack into a single shared page, so a multi-page
    array will have no ioreq_t straddling a page boundary

Also, while modifying qemu, replace a 32-entry vcpu-indexed array
with a dynamically-allocated array.
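
The packing argument above can be checked with a little arithmetic. The sketch below assumes a 4096-byte page (the usual x86 page size; the commit message does not state it), and the helper names are hypothetical and only illustrate the reasoning.

```python
PAGE_SIZE = 4096   # assumed x86 page size in bytes (not stated in the commit)
IOREQ_SIZE = 32    # ioreq_t size after this change, per the commit message


def ioreqs_per_page(ioreq_size=IOREQ_SIZE, page_size=PAGE_SIZE):
    """Number of whole ioreq slots that fit in one shared page."""
    return page_size // ioreq_size


def straddles_page(index, ioreq_size=IOREQ_SIZE, page_size=PAGE_SIZE):
    """True if slot `index` of a densely packed array crosses a page boundary."""
    first_byte = index * ioreq_size
    last_byte = first_byte + ioreq_size - 1
    return first_byte // page_size != last_byte // page_size
```

Under these assumptions one page holds 128 slots, and no slot of a multi-page array crosses a boundary. A size that does not divide the page, such as 48 bytes, would put slot 85 at bytes 4080 to 4127, across two pages.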

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoUpdate .hgignore list
Keir Fraser [Thu, 29 Oct 2009 11:50:09 +0000 (11:50 +0000)]
Update .hgignore list

16 years agoPoint per-vcpu vcpu_info at a dummy structure by default, avoiding
Keir Fraser [Thu, 29 Oct 2009 11:14:54 +0000 (11:14 +0000)]
Point per-vcpu vcpu_info at a dummy structure by default, avoiding
need for scattered NULL-pointer checks.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agominios: xmalloc and realloc fixes
Keir Fraser [Thu, 29 Oct 2009 08:34:51 +0000 (08:34 +0000)]
minios: xmalloc and realloc fixes

 - xmalloc currently faults if xmalloc_new_page fails due to OOM
 - realloc treats xmalloc_hdr.size as the size of just the data region
   rather than the total size of data region + headers + padding.

From: James Pendergrass <James.Pendergrass@jhuapl.edu>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoiommu: Do not initialise global vars explicitly to zero.
Keir Fraser [Wed, 28 Oct 2009 17:27:47 +0000 (17:27 +0000)]
iommu: Do not initialise global vars explicitly to zero.

Unnecessary and prevents them being allocated in BSS rather than data.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agovtd: Simplify acpi_dmar_init().
Keir Fraser [Wed, 28 Oct 2009 17:27:09 +0000 (17:27 +0000)]
vtd: Simplify acpi_dmar_init().

No need to check force_iommu, as that is done later in common code.

Also no need to clear iommu_enabled as again this gets checked
later. Furthermore doing it here, from a non-Intel-specific callsite,
breaks other vendors' IOMMU support.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoAMD IOMMU: Use global interrupt remapping table by default
Keir Fraser [Wed, 28 Oct 2009 17:08:26 +0000 (17:08 +0000)]
AMD IOMMU: Use global interrupt remapping table by default

Using a global interrupt remapping table shared by all devices has
better compatibility with certain old BIOSes. Per-device interrupt
remapping table can still be enabled by using a new parameter
"amd-iommu-perdev-intremap".

Signed-off-by: Wei Wang <wei.wang2@amd.com>
16 years agoxend: disallow ! as a sxp separator
Keir Fraser [Wed, 28 Oct 2009 10:59:55 +0000 (10:59 +0000)]
xend: disallow ! as a sxp separator

Signed-off-by: Jim Fehlig <jfehlig@novell.com>
16 years agox86: vioapic: fix remote irr bit setting for level triggered interrupts
Keir Fraser [Wed, 28 Oct 2009 10:59:14 +0000 (10:59 +0000)]
x86: vioapic: fix remote irr bit setting for level triggered interrupts

Clear the remote irr bit of every RTE whose vector field matches the
vector in the EOI message.
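
As a rough illustration of the behaviour described, the sketch below models the redirection table as a list of dictionaries. The data layout and the function name are hypothetical and are not taken from the vioapic code.

```python
def clear_remote_irr_on_eoi(redirtbl, eoi_vector):
    """Clear remote_irr on every entry whose vector matches the EOI vector.

    `redirtbl` is a list of dicts with "vector" and "remote_irr" keys
    (a hypothetical stand-in for the emulated IOAPIC redirection table).
    Returns the list of pin numbers that were cleared.
    """
    cleared = []
    for pin, entry in enumerate(redirtbl):
        if entry["vector"] == eoi_vector:
            entry["remote_irr"] = 0
            cleared.append(pin)
    return cleared
```

The point is that the loop does not stop at the first match: several pins may share a vector, and each of them needs its bit cleared.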

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
16 years agoscheduler: small csched_cpu_pick() adjustments
Keir Fraser [Wed, 28 Oct 2009 10:56:39 +0000 (10:56 +0000)]
scheduler: small csched_cpu_pick() adjustments

When csched_cpu_pick() decides to move a vCPU to a different pCPU, so
far in the vast majority of cases it selected the first core/thread of
the most idle socket/core. When there are many short executing
entities, this will generally lead to them not getting evenly
distributed (since primary cores/threads will be preferred), making
the need for subsequent migration more likely. Instead, candidate
cores/threads should get treated as symmetrically as possible, and
hence this changes the selection logic to cycle through all
candidates.

Further, since csched_cpu_pick() will never move a vCPU between
threads of the same core (and since the weights calculated for
individual threads of the same core are always identical), rather than
removing just the selected pCPU from the mask that still needs looking
at, all siblings of the chosen pCPU can be removed at once without
affecting the outcome.
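
The difference between "always take the first candidate" and "cycle through all candidates" can be sketched as follows. This is a simplified model with a hypothetical helper, not the scheduler's actual code; CPUs are plain integers and the candidate mask is a set.

```python
def cycle_cpu(last, candidates):
    """Return the next candidate CPU after `last`, wrapping around.

    `candidates` is a non-empty set of CPU numbers; `last` is the CPU
    chosen previously (use -1 when nothing has been chosen yet).
    """
    ordered = sorted(candidates)
    for cpu in ordered:
        if cpu > last:
            return cpu
    # Nothing above `last`: wrap around to the lowest-numbered candidate.
    return ordered[0]
```

Calling this repeatedly with the previous result spreads successive picks over all candidates, whereas `min(candidates)` would return the same primary core or thread every time.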

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agox86: deny access to the ACPI PM timer I/O port range for Dom0
Keir Fraser [Wed, 28 Oct 2009 10:55:53 +0000 (10:55 +0000)]
x86: deny access to the ACPI PM timer I/O port range for Dom0

Also move the declaration of pmtmr_ioport to a suitable header file.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agoBoot parameter definition adjustments
Keir Fraser [Wed, 28 Oct 2009 10:55:17 +0000 (10:55 +0000)]
Boot parameter definition adjustments

Consolidate the various attributes into macros, and tell the compiler
not to needlessly waste space for aligning strings used at most once.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agoMiscellaneous data placement adjustments
Keir Fraser [Wed, 28 Oct 2009 10:54:50 +0000 (10:54 +0000)]
Miscellaneous data placement adjustments

Make various data items const or __read_mostly where
possible/reasonable.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agoirq cleanup
Keir Fraser [Wed, 28 Oct 2009 10:54:20 +0000 (10:54 +0000)]
irq cleanup

Make IRQ related data const or __read_mostly where possible/reasonable,
use platform_legacy_irq() where feasible, and remove the now unused
definition of vector_to_irq().

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agoxsm: Add support for Xen device policies
Keir Fraser [Tue, 27 Oct 2009 12:52:57 +0000 (12:52 +0000)]
xsm: Add support for Xen device policies

Add support for Xen ocontext records to enable device policies.  The
default policy will not be changed and instructions have been added to
enable the new functionality.  Examples on how to use the new policy
language have been added but commented out.  Recent versions of
checkpolicy (>= 2.0.20) and libsepol (>= 2.0.39) are needed in order to
compile it.  Devices can be labeled and enforced using the following
new commands: pirqcon, iomemcon, ioportcon and pcidevicecon.

Signed-off-by : George Coker <gscoker@alpha.ncsc.mil>
Signed-off-by : Paul Nuzzi <pjnuzzi@tycho.ncsc.mil>

16 years agoxend: Add keymap to vfb config for hvm guests
Keir Fraser [Tue, 27 Oct 2009 12:52:14 +0000 (12:52 +0000)]
xend: Add keymap to vfb config for hvm guests

From: Jim Fehlig <jfehlig@novell.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agox86: IRQ Migration logic enhancement.
Keir Fraser [Mon, 26 Oct 2009 13:33:38 +0000 (13:33 +0000)]
x86: IRQ Migration logic enhancement.

To program an MSI's address/vector safely, delay the irq migration
operation until just before the next interrupt is acked. This should
avoid inconsistent interrupt generation caused by the non-atomic
writes to the MSI address and data registers.

Port the logic from Linux and tailor it for Xen.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
16 years agox86: Small simplification to get_page_from_l1e().
Keir Fraser [Mon, 26 Oct 2009 13:26:43 +0000 (13:26 +0000)]
x86: Small simplification to get_page_from_l1e().

No need for separate top-level check for page owner being NULL: this
can be folded into the case that page owner is not who the caller
expected (caller will never expect NULL owner).

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agohvm: Clean up EPT/NPT 'nested page fault' handling.
Keir Fraser [Mon, 26 Oct 2009 13:19:33 +0000 (13:19 +0000)]
hvm: Clean up EPT/NPT 'nested page fault' handling.

Share most of the code.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoxend, passthrough: Small fix to find_all_the_multi_functions()
Keir Fraser [Mon, 26 Oct 2009 12:20:07 +0000 (12:20 +0000)]
xend, passthrough: Small fix to find_all_the_multi_functions()

From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoshadow dirty-VRAM: avoid multiple remove_all_mappings calls.
Keir Fraser [Mon, 26 Oct 2009 12:18:50 +0000 (12:18 +0000)]
shadow dirty-VRAM: avoid multiple remove_all_mappings calls.

sh_remove_all_mappings() will walk roughly half of the shadow L1
tables for each MFN it's called with; calling it for every MFN in a
guest's framebuffer can be _very_ expensive, especially with the
shadow lock held across the whole operation.  Avoid that by just
blowing away all the shadows.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
16 years agox86: Enable TSC_RELIABLE for AMD servers
Keir Fraser [Fri, 23 Oct 2009 09:15:17 +0000 (10:15 +0100)]
x86: Enable TSC_RELIABLE for AMD servers

Except for a published BIOS errata on family 11h processors,
all AMD servers that have the Invariant TSC bit set have
a reliable TSC so Xen should not write to the TSC.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
16 years agox86 ept: ignore guest writes to read only memory regions or memory
Keir Fraser [Fri, 23 Oct 2009 09:13:52 +0000 (10:13 +0100)]
x86 ept: ignore guest writes to read only memory regions or memory
holes in EPT.

This patch prevents domain crash when running memtest86 with EPT.

Signed-off-by: Xin Li <xin.li@intel.com>
16 years agovtd: interrupt remapping fix
Keir Fraser [Fri, 23 Oct 2009 09:13:22 +0000 (10:13 +0100)]
vtd: interrupt remapping fix

Fix an error in the translation from an interrupt remapping table entry
(IRTE) to an MSI message. This error may cause a wrong IRTE to be written
back to the VT-d hardware, blocking physical interrupts.

Signed-Off-By: Zhai Edwin <edwin.zhai@intel.com>
16 years agoxsm: Corrected check in io_has_perm()
Keir Fraser [Fri, 23 Oct 2009 09:12:52 +0000 (10:12 +0100)]
xsm: Corrected check in io_has_perm()

Fix the check in io_has_perm() to correctly check the start and end
of I/O Memory.

Signed-off-by : George Coker <gscoker@alpha.ncsc.mil>
Signed-off-by : Paul Nuzzi <pjnuzzi@tycho.ncsc.mil>

16 years agox86: Fix RevF detection in powernow.c
Keir Fraser [Fri, 23 Oct 2009 09:11:52 +0000 (10:11 +0100)]
x86: Fix RevF detection in powernow.c

The PowerNow! driver does not support RevF and earlier parts.
The current code checks for RevF processors in a function that
is not called.  Change the code path so that RevF processors
are detected and the driver fails registration.

Also fix cpufreq_add_cpu() to handle unsuccessful registration.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
16 years agoblktap2: Fix sysfs handling of blktap2
Keir Fraser [Fri, 23 Oct 2009 09:09:37 +0000 (10:09 +0100)]
blktap2: Fix sysfs handling of blktap2

The pause and unpause paths are currently broken due to a missing
slash. I took advantage of the opportunity to remove code repetition,
replace repeated strings with the proper constants, etc.

From: Andres Lagar Cavilla <andreslc@cs.toronto.edu>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoxsm: Add getenforce and setenforce functionality to tools
Keir Fraser [Fri, 23 Oct 2009 09:05:15 +0000 (10:05 +0100)]
xsm: Add getenforce and setenforce functionality to tools

This patch exposes the getenforce and setenforce functionality for the
Flask XSM module.

Signed-off-by : Machon Gregory <mbgrego@tycho.ncsc.mil>
Signed-off-by : George S. Coker, II <gscoker@alpha.ncsc.mil>

16 years agopassthrough/stubdom: clean up hypercall privilege checking
Keir Fraser [Fri, 23 Oct 2009 09:04:03 +0000 (10:04 +0100)]
passthrough/stubdom: clean up hypercall privilege checking

This patch adds security checks for pci passthrough related hypercalls
to enforce that the current domain owns the resources that it is about
to remap. It also adds a call to xc_assign_device to xend and removes
the PRIVILEGED_STUBDOMS flags.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
16 years agoblktap: Fix check_sharing() in blktapctrl
Keir Fraser [Fri, 23 Oct 2009 09:02:09 +0000 (10:02 +0100)]
blktap: Fix check_sharing() in blktapctrl

check_sharing() in blktapctrl does not work.
 - It accesses xenstore using the wrong paths.
 - It compares image paths including image types.
 - It misinterprets the return value of strcmp().

This patch fixes those mistakes.

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
16 years agolibxc: fix a few memory leaks
Keir Fraser [Fri, 23 Oct 2009 09:00:22 +0000 (10:00 +0100)]
libxc: fix a few memory leaks

Running qemu under valgrind, I found a couple of small memory leaks in
libxc; this patch fixes them.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
16 years agominios: Optimize mmap(open("/dev/mem"))
Keir Fraser [Fri, 23 Oct 2009 08:59:45 +0000 (09:59 +0100)]
minios: Optimize mmap(open("/dev/mem"))

Set map_frames_ex's stride parameter to 0 and increment to 1 to avoid
building an explicit list of mfns.
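
To illustrate why stride 0 with increment 1 removes the need for an explicit list, the sketch below assumes the frame for page i is computed as mfns[i * stride] + i * incr. That formula is an assumption about how the parameters are interpreted, and the function name is hypothetical.

```python
def frame_for_page(mfns, i, stride, incr):
    """Frame number backing page `i`, under the assumed stride/incr formula."""
    return mfns[i * stride] + i * incr


def frames(mfns, n, stride, incr):
    """Frame numbers for `n` consecutive pages."""
    return [frame_for_page(mfns, i, stride, incr) for i in range(n)]
```

With stride 0 every page reads the same first element and the increment supplies the offset, so a one-element list describes a contiguous range. With stride 1 and increment 0, the caller has to supply every frame number explicitly.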

Signed-Off-By: Samuel Thibault <samuel.thibault@ens-lyon.org>
16 years agostubdom: mmap on /dev/mem support
Keir Fraser [Wed, 21 Oct 2009 15:08:28 +0000 (16:08 +0100)]
stubdom: mmap on /dev/mem support

This patch adds support for mmap on /dev/mem in a stubdom; it is
secure because it only works for memory areas that have been
explicitly allowed by the toolstack (xc_domain_iomem_permission).
Incidentally this is all that is needed to make MSI-X passthrough work
with stubdoms.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
16 years agox86: Initialize the affinity field after assigning the vector.
Keir Fraser [Wed, 21 Oct 2009 15:07:37 +0000 (16:07 +0100)]
x86: Initialize the affinity field after assigning the vector.

To avoid strange output from debug-key "i", desc->affinity should
be a subset of cfg->domain, so copy cfg->domain to desc->affinity
after assigning the vector for the irq.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
16 years agoUpdate QEMU_TAG to a3285ff385d2568f0226f15fee2b9808ec3b6deb
Keir Fraser [Wed, 21 Oct 2009 15:06:30 +0000 (16:06 +0100)]
Update QEMU_TAG to a3285ff385d2568f0226f15fee2b9808ec3b6deb

16 years agoRemove unused XEN_DOMINF_cpu{mask,shift} definitions.
Keir Fraser [Wed, 21 Oct 2009 15:05:05 +0000 (16:05 +0100)]
Remove unused XEN_DOMINF_cpu{mask,shift} definitions.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoxend: bootable flag of VBD not always of type int
Keir Fraser [Wed, 21 Oct 2009 08:23:10 +0000 (09:23 +0100)]
xend: bootable flag of VBD not always of type int

1. Calling VBD.set_bootable(True) results in string 'True' in managed
   config file. After xend restart, conversion int(bootable) in
   server/blkif.py fails.
2. selection of bootable disks in XendDomainInfo.py requires
   type(bootable) == int not str, otherwise all disks are taken as
   bootable.

This patch converts the bootable flag always to int.
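
A minimal sketch of such a normalisation is shown below. The helper name and the exact set of accepted strings are assumptions for illustration, not the code in xend.

```python
def to_bootable_int(value):
    """Normalise a bootable flag (bool, int or string) to the int 0 or 1."""
    if isinstance(value, str):
        text = value.strip().lower()
        if text in ("true", "yes"):
            return 1
        if text in ("false", "no", ""):
            return 0
        # Numeric strings such as '0' or '1'; raises ValueError otherwise.
        return 1 if int(text) != 0 else 0
    # bool and int values fall through to ordinary truthiness.
    return 1 if value else 0
```

A plain `int('True')` raises ValueError, which is the failure described in item 1. A check like `type(bootable) == int` never passes for the string form, which is the failure described in item 2.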

Signed-off-by: Lutz Dube <Lutz.Dube@ts.fujitsu.com>
16 years agoxmalloc_tlsf: Fall back to xmalloc_whole_pages() if xmem_pool_alloc() fails.
Keir Fraser [Wed, 21 Oct 2009 08:21:01 +0000 (09:21 +0100)]
xmalloc_tlsf: Fall back to xmalloc_whole_pages() if xmem_pool_alloc() fails.

This was happening for xmalloc request sizes between 3921 and 3951
bytes. The reason being that xmem_pool_alloc() may add extra padding
to the requested size, making the total block size greater than a
page.

Rather than add yet more smarts about TLSF to _xmalloc(), we just
dumbly attempt any request smaller than a page via xmem_pool_alloc()
first, then fall back on xmalloc_whole_pages() if this fails.
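
The fallback can be modelled in a few lines. In the sketch below the pool overhead is a made-up constant (the real TLSF padding depends on the request), and the functions are simplified stand-ins that only share their names with the Xen routines.

```python
PAGE_SIZE = 4096
POOL_OVERHEAD = 176  # hypothetical padding/header cost, for illustration only


def xmem_pool_alloc(size):
    """Stand-in pool allocator: fails when the padded block exceeds a page."""
    if size + POOL_OVERHEAD > PAGE_SIZE:
        return None
    return ("pool", size)


def xmalloc_whole_pages(size):
    """Stand-in page allocator: rounds the request up to whole pages."""
    pages = -(-size // PAGE_SIZE)  # ceiling division
    return ("pages", pages)


def xmalloc(size):
    """Try the pool for sub-page requests, then fall back to whole pages."""
    if size < PAGE_SIZE:
        block = xmem_pool_alloc(size)
        if block is not None:
            return block
    return xmalloc_whole_pages(size)
```

The caller no longer needs to know how much padding the pool adds. A request just under a page that the pool rejects is served from whole pages instead of failing.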

Based on bug diagnosis and initial patch by John Byrne <john.l.byrne@hp.com>

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agostubdom: implement pci coldplug
Keir Fraser [Wed, 21 Oct 2009 07:51:10 +0000 (08:51 +0100)]
stubdom: implement pci coldplug

This patch fixes the circular dependency problem in the toolstack that
prevented pci coldplug from working with stubdoms: after creating the
stubdom we wait for it to be properly initialized before going
further. We release the domain lock while we wait.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
16 years agox86: MSI: Mask/unmask msi irq during the window which programs msi.
Keir Fraser [Wed, 21 Oct 2009 07:50:23 +0000 (08:50 +0100)]
x86: MSI: Mask/unmask msi irq during the window which programs msi.

When programming an MSI, it has to be masked first; otherwise it
may generate inconsistent interrupts. According to the spec, the
interrupt generation behaviour is undefined if it is not masked.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
16 years agoObtain Linux kernel via git protocol by default (GIT_HTTP=y overrides)
Keir Fraser [Tue, 20 Oct 2009 13:36:01 +0000 (14:36 +0100)]
Obtain Linux kernel via git protocol by default (GIT_HTTP=y overrides)

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoFix nomigrate option implementation so that Xen builds.
Keir Fraser [Tue, 20 Oct 2009 09:23:28 +0000 (10:23 +0100)]
Fix nomigrate option implementation so that Xen builds.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoAdd nomigrate config option to disable migration/restore
Keir Fraser [Tue, 20 Oct 2009 07:45:12 +0000 (08:45 +0100)]
Add nomigrate config option to disable migration/restore

The new nomigrate option can be set to non-zero in vm.cfg
(for both hvm and pvm) to disallow a guest from being
migrated or restored.  (Save is still allowed for the purpose
of checkpointing.)  The option persists into a save file
and is also communicated into the hypervisor, the latter
for the purposes of a to-be-added hypercall for communicating
to guests that migration is disallowed (which will be
used initially for userland TSC-related sensing, but may
find other uses).
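
As an illustration, a guest configuration using the option might look like the fragment below (xm configuration files use Python syntax). The guest name is a hypothetical placeholder; only the nomigrate setting comes from the description above.

```python
# Hypothetical guest configuration fragment (e.g. vm.cfg).
name = "example-guest"  # illustrative name, not from the commit
nomigrate = 1           # non-zero: disallow migrate/restore; save still works
```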

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
16 years agoxend: Cast oos flag to int before arithmetic.
Keir Fraser [Tue, 20 Oct 2009 07:43:27 +0000 (08:43 +0100)]
xend: Cast oos flag to int before arithmetic.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agovtd: Disable VT-d if no DRHD units are probed.
Keir Fraser [Mon, 19 Oct 2009 15:50:14 +0000 (16:50 +0100)]
vtd: Disable VT-d if no DRHD units are probed.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agovtd: A few cleanups to avoid dereferencing NULL drhd pointers.
Keir Fraser [Mon, 19 Oct 2009 12:31:21 +0000 (13:31 +0100)]
vtd: A few cleanups to avoid dereferencing NULL drhd pointers.

In most cases I simply remove the reference since it is never actually
used.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoRevert 20338:5f28661bb2bb
Keir Fraser [Mon, 19 Oct 2009 12:03:03 +0000 (13:03 +0100)]
Revert 20338:5f28661bb2bb

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoAllow guests to register secondary vcpu_time_info
Keir Fraser [Mon, 19 Oct 2009 10:58:36 +0000 (11:58 +0100)]
Allow guests to register secondary vcpu_time_info

Allow a guest to register a second location for the VCPU time info
structure for each vcpu.  This is intended to allow the guest kernel
to map this information into a usermode accessible page, so that
usermode can efficiently calculate system time from the TSC without
having to make a syscall.
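
The calculation usermode would perform can be sketched as follows. The field names follow the public vcpu_time_info structure (tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift), which this commit message does not spell out. The function itself is illustrative and omits the version check a real reader needs in order to detect concurrent updates.

```python
def system_time_ns(tsc, info):
    """Convert a TSC reading to system time in nanoseconds.

    `info` is a dict holding a snapshot of the time info fields.
    tsc_to_system_mul is a 32.32 fixed-point multiplier and tsc_shift
    is a signed pre-shift applied to the TSC delta.
    """
    delta = tsc - info["tsc_timestamp"]
    shift = info["tsc_shift"]
    if shift >= 0:
        delta <<= shift
    else:
        delta >>= -shift
    return info["system_time"] + ((delta * info["tsc_to_system_mul"]) >> 32)
```

For example, with a multiplier of 2**31 (0.5 ns per tick, as on a 2 GHz TSC) and no shift, 2000 ticks after the snapshot correspond to 1000 ns of elapsed system time.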

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agovt-d: do not enable VT-d on acpi=off
Keir Fraser [Mon, 19 Oct 2009 09:57:58 +0000 (10:57 +0100)]
vt-d: do not enable VT-d on acpi=off

This reverts changeset 20323: 2370e16ab6d3 and adds a small
check to iommu_setup() which should more correctly cover all cases.

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
16 years agox86 shadow: Update cr3 in PAE mode when guest walk succeeds but shadow walk fails
Keir Fraser [Mon, 19 Oct 2009 09:56:58 +0000 (10:56 +0100)]
x86 shadow: Update cr3 in PAE mode when guest walk succeeds but shadow walk fails

When running in PAE mode, Windows 7 (apparently) will occasionally
switch cr3 with one of the L3 entries invalid, make it valid, and then
expect the hardware to load the new value.  (This behavior is
explicitly not promised in the hardware manuals.)  This leads to a
situation where on a shadow fault, the guest walk succeeds but the
shadow walk fails.  The code assumes this can only happen when the
domain is dying, and makes an ASSERT() to that effect.  So currently,
in debug mode, this will cause the host to crash; in non-debug mode,
this will cause a page-fault loop.

This patch solves the problem by calling update_cr3() in that path
when the guest is in PAE mode, and only ASSERT()ing when the guest is
not in PAE mode.  The guest will get one spurious page fault, but
subsequent accesses will succeed.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
16 years agoPer-domain switch to disable oos shadow page tables
Keir Fraser [Mon, 19 Oct 2009 09:55:46 +0000 (10:55 +0100)]
Per-domain switch to disable oos shadow page tables

Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
16 years ago[IOMMU] clean interrupt remapping and queued invalidation
Keir Fraser [Mon, 19 Oct 2009 09:54:35 +0000 (10:54 +0100)]
[IOMMU] clean interrupt remapping and queued invalidation

This patch enlarges the interrupt remapping table to fix out-of-range
table accesses when using many multiple-function PCI devices.
The invalidation queue is also expanded.

Signed-Off-By: Zhai Edwin <edwin.zhai@intel.com>
Signed-Off-By: Cui Dexuan <dexuan.cui@intel.com>
16 years agox86: vMSI: Fix msi irq affinity issue for hvm guest.
Keir Fraser [Mon, 19 Oct 2009 09:50:46 +0000 (10:50 +0100)]
x86: vMSI: Fix msi irq affinity issue for hvm guest.

There is a race between the guest setting a new vector and doing EOI on
the old vector.  If the guest sets the new vector before doing EOI on the
old one, then when the guest does the EOI the hypervisor may fail to find
the related pirq and miss the EOI of the real vector, leading to a system
hang.  We may need to add a timer for each pirq interrupt source to avoid
a host hang, but this is another topic and will be addressed later.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
16 years agogdbsx: malloc extra byte for null char
Keir Fraser [Mon, 19 Oct 2009 09:49:23 +0000 (10:49 +0100)]
gdbsx: malloc extra byte for null char

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
16 years agoxm,xend: Add commands to hotplug usb devices to hvm guests
Keir Fraser [Mon, 19 Oct 2009 09:48:47 +0000 (10:48 +0100)]
xm,xend: Add commands to hotplug usb devices to hvm guests
Signed-off-by: James Song Wei <jsong@novell.com>
16 years agoxm: Fix xm network2-{attach,detach}
Keir Fraser [Mon, 19 Oct 2009 09:47:09 +0000 (10:47 +0100)]
xm: Fix xm network2-{attach,detach}

"xm help" is aborted due to a missing comma.
Some fixes in passing.
- less help message.
- typo.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
16 years agoxend: Implement VBD.media_change
Keir Fraser [Fri, 16 Oct 2009 08:04:53 +0000 (09:04 +0100)]
xend: Implement VBD.media_change

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
16 years agoxm: Use 'vifname' config option to construct a qemu tap name.
Keir Fraser [Fri, 16 Oct 2009 07:36:22 +0000 (08:36 +0100)]
xm: Use 'vifname' config option to construct a qemu tap name.

Signed-off-by: Jim Fehlig <jfehlig@novell.com>
16 years agoxend: Check no VBDs attached on VDI.destroy
Keir Fraser [Fri, 16 Oct 2009 07:35:21 +0000 (08:35 +0100)]
xend: Check no VBDs attached on VDI.destroy

We can destroy a VDI by VDI.destroy even if the VDI is being used
by VBDs. This patch checks that the VDI is not in use by any VBD.

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
16 years agox86: document tsc_native configuration option in xmexample.hvm.
Keir Fraser [Fri, 16 Oct 2009 07:34:49 +0000 (08:34 +0100)]
x86: document tsc_native configuration option in xmexample.hvm.

Set the default value to 1

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
16 years agoxm,xend: A few fixes for changeset 20314
Keir Fraser [Fri, 16 Oct 2009 07:32:34 +0000 (08:32 +0100)]
xm,xend: A few fixes for changeset 20314

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
16 years agox86: Update powernow.c to latest cpufreq code
Keir Fraser [Fri, 16 Oct 2009 07:31:39 +0000 (08:31 +0100)]
x86: Update powernow.c to latest cpufreq code

The general cpufreq infrastructure has been improved over the
last year.  Update the AMD PowerNow! driver powernow.c to
take advantage of those improvements.

Specifically, addresses Novell bugzilla # 530035.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
16 years agoxend: passthrough: do not check non-page-aligned MMIO BAR if not strict-check
Keir Fraser [Fri, 16 Oct 2009 07:30:13 +0000 (08:30 +0100)]
xend: passthrough: do not check non-page-aligned MMIO BAR if not strict-check

When the option pci-passthrough-strict-check of
/etc/xen/xend-config.sxp is set to 'no', we don't check the
non-page-aligned MMIO BAR.  This could be useful in some cases, e.g.,
when there is only 1 device in the range of the page and we try to
assign the device to pv guest.

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
16 years agovt-d: Fix panic in msi_msg_read_remap_rte with acpi=off
Keir Fraser [Fri, 16 Oct 2009 07:28:47 +0000 (08:28 +0100)]
vt-d: Fix panic in msi_msg_read_remap_rte with acpi=off

Xen panics when "acpi=off noacpi" is set. The problem is caused by
dereferencing a NULL drhd pointer after calling
acpi_find_matched_drhd_unit. As acpi_find_matched_drhd_unit can
return NULL, the returned value has to be checked before it is used.

From: Miroslav Rezanina <mrezanin@redhat.com>
Signed-off-by: Keir Fraser <keir.fraser@eu.citrix.com>
16 years agox86: Fix ept and vt-d co-existence issue.
Keir Fraser [Fri, 16 Oct 2009 07:25:17 +0000 (08:25 +0100)]
x86: Fix ept and vt-d co-existence issue.

Once EPT is enabled, vt-d's mmio address ranges should be added to the
EPT page tables with the p2m lock held, after which the guest can access
these ranges like conventional RAM. Changing the EPT entries requires
taking the p2m lock first.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
16 years agoxend: add a description config item for each guest.
Keir Fraser [Fri, 16 Oct 2009 07:24:47 +0000 (08:24 +0100)]
xend: add a description config item for each guest.

Add a new option "description=" to each VM to improve the
manageability of VMs. It can be read back via "xm list -l
MACHINE". E.g., add "description='(name, james),(priority 5), (owner
james.song@company.com)'" to the configuration file; the user can then
get the VM's attributes easily with "xm list -l MACHINE" or other tools.

Signed-off-by: James Song <jsong@novell.com>
16 years agoRemove bogus call to get_domain_by_id() in do_domctl().
Keir Fraser [Thu, 15 Oct 2009 15:49:21 +0000 (16:49 +0100)]
Remove bogus call to get_domain_by_id() in do_domctl().
Signed-off-by: Keir Fraser <keir.fraser@eu.citrix.com>
16 years agogdbsx: a gdbserver stub for xen.
Keir Fraser [Thu, 15 Oct 2009 08:36:40 +0000 (09:36 +0100)]
gdbsx: a gdbserver stub for xen.

It should be run in dom0 on a gdbsx-enabled hypervisor. For details,
please see tools/debugger/gdbsx/README.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoxm: Reuse VDI if location is same
Keir Fraser [Thu, 15 Oct 2009 07:31:08 +0000 (08:31 +0100)]
xm: Reuse VDI if location is same

In XenAPI mode, when a VM is started with the xm create command, VDIs
are always created automatically for its VBDs.  If the VM is shut
down and then started again, new VDIs are created each time.  As a
result, the vdi.xml file keeps growing.

This patch reuses an existing VDI if its location is the same.
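
The reuse rule can be sketched as a lookup keyed by location. This is a minimal model, not xend's code: the VdiRegistry class, its method names and the dict-based VDI record are all hypothetical.

```python
import uuid


class VdiRegistry:
    """Hypothetical stand-in for the set of VDIs persisted in vdi.xml."""

    def __init__(self):
        # Maps a VDI location string to its record.
        self._by_location = {}

    def get_or_create(self, location):
        """Return the VDI for this location, creating one only if none exists."""
        vdi = self._by_location.get(location)
        if vdi is None:
            vdi = {"uuid": str(uuid.uuid4()), "location": location}
            self._by_location[location] = vdi
        return vdi

    def __len__(self):
        return len(self._by_location)


registry = VdiRegistry()
first = registry.get_or_create("file:/var/images/guest.img")
# Starting the same VM again asks for the same location and gets the same VDI.
second = registry.get_or_create("file:/var/images/guest.img")
```

Because the second request returns the existing record, restarting a VM no longer adds an entry, so the registry stays the same size.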

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
16 years agox86: Remove unused temporary variable 'old_gvec'.
Keir Fraser [Thu, 15 Oct 2009 07:30:31 +0000 (08:30 +0100)]
x86: Remove unused temporary variable 'old_gvec'.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
16 years agolockprof: Fix x86_32 build and clean up coding style
Keir Fraser [Thu, 15 Oct 2009 07:29:33 +0000 (08:29 +0100)]
lockprof: Fix x86_32 build and clean up coding style

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoxend: destroy stubdoms synchronously
Keir Fraser [Thu, 15 Oct 2009 07:16:42 +0000 (08:16 +0100)]
xend: destroy stubdoms synchronously

This patch makes the destruction of stubdoms a synchronous event, so
it is no longer possible to run out of memory when rebooting a guest:
the stubdom of the old guest is always destroyed before the new guest
is created.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
16 years agoxm,xend: Add new option "change_home_server" to xm migrate
Keir Fraser [Wed, 14 Oct 2009 08:09:23 +0000 (09:09 +0100)]
xm,xend: Add new option "change_home_server" to xm migrate

This patch adds a new option to the xm migrate command.  The concept
behind the option is inspired by XenServer/XenCenter: "Change home
server (affinity)."  The option name is "change_home_server."

Currently, the config.sxp file of a managed domain is not migrated to
the destination server even if the migration of the managed domain
succeeds.  The config.sxp file is kept on the source server.

With this patch, the config.sxp file is migrated with the managed
domain: it is unregistered from the source server, then registered on
the destination server.

BTW, should the config.sxp file always be migrated, even without the
option?  If the managed domain is migrated without the option, the
managed domain exists on both the source server and the destination
server.  (Of course, the managed domain on the source server is in the
"halted" state, and the one on the destination server is in the
"running" state.)  Is it good that managed domains with the same UUID
exist on both servers?  (In the patch, I added the option for
compatibility.)
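
The effect of the option on config registration can be modelled roughly as follows. This is a sketch under assumptions: the function name and the use of plain dicts for each server's registered config.sxp files are hypothetical, and the live migration of the running domain itself is not modelled.

```python
def migrate_managed_config(dom_uuid, source, destination, change_home_server=False):
    """Model which server holds a managed domain's config after migration.

    `source` and `destination` are dicts mapping a domain UUID to its
    config, standing in for each server's registered config.sxp files.
    """
    config = source[dom_uuid]
    if change_home_server:
        # Unregister from the source server, then register on the destination.
        del source[dom_uuid]
        destination[dom_uuid] = config
    # Without the option the config stays registered on the source only,
    # which matches the behaviour described as "currently" above.


src = {"1234": {"name": "vm1"}}
dst = {}
migrate_managed_config("1234", src, dst, change_home_server=True)
```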

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
16 years agoUpdate QEMU_TAG to 71324566f3b95bb88105659439adaef1d5bd155c
Keir Fraser [Wed, 14 Oct 2009 08:08:16 +0000 (09:08 +0100)]
Update QEMU_TAG to 71324566f3b95bb88105659439adaef1d5bd155c

16 years agoSpinlock profiling (enable in build with lock_profile=y)
Keir Fraser [Wed, 14 Oct 2009 08:07:51 +0000 (09:07 +0100)]
Spinlock profiling (enable in build with lock_profile=y)

Adds new tool xenlockprof to run from dom0.

From: Juergen Gross <juergen.gross@ts.fujitsu.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agominios: fix console end of line: \n\r -> \r\n
Keir Fraser [Wed, 14 Oct 2009 07:58:47 +0000 (08:58 +0100)]
minios: fix console end of line: \n\r -> \r\n

Change the end of line produced by minios' console from \n\r to \r\n.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
16 years agoxend: call pci_device_configure on the stubdom
Keir Fraser [Wed, 14 Oct 2009 07:56:55 +0000 (08:56 +0100)]
xend: call pci_device_configure on the stubdom

Whenever pci_device_configure is called on a guest that has a stubdom,
call pci_device_configure on the stubdom as well.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
16 years agoxend: allow a device to be assigned to a guest and its stubdom
Keir Fraser [Wed, 14 Oct 2009 07:56:25 +0000 (08:56 +0100)]
xend: allow a device to be assigned to a guest and its stubdom

This patch allows a PCI device to be passed through to an HVM guest
and its own stubdom at the same time.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
16 years agominios: fix minios console
Keir Fraser [Wed, 14 Oct 2009 07:55:43 +0000 (08:55 +0100)]
minios: fix minios console

MiniOS' console_print tries to expand '\n' into "\n\r" in place,
causing page faults if the string resides in text.
Use a duplicate of the string instead.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
16 years agoAdd build option to allow more hypercalls from stubdoms
Keir Fraser [Wed, 14 Oct 2009 07:54:58 +0000 (08:54 +0100)]
Add build option to allow more hypercalls from stubdoms

Stubdoms need to be able to make all the passthrough related
hypercalls on behalf of the guest (for now).

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agostubdom pcilib: define endianness for minios
Keir Fraser [Wed, 14 Oct 2009 07:33:11 +0000 (08:33 +0100)]
stubdom pcilib: define endianness for minios

Include endian.h for MiniOS.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
16 years agominios pcifront: translate physical into virtual addresses
Keir Fraser [Wed, 14 Oct 2009 07:31:07 +0000 (08:31 +0100)]
minios pcifront: translate physical into virtual addresses

Qemu understands physical pci addresses while pciback expects virtual
pci addresses: this patch adds a translation function in pcifront to
make the conversion.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
16 years agopv-on-hvm: Adjust mkbuildtree to handle pv_ops header placement
Keir Fraser [Wed, 14 Oct 2009 07:29:56 +0000 (08:29 +0100)]
pv-on-hvm: Adjust mkbuildtree to handle pv_ops header placement

Due to the movement of the arch include directories, we need to adjust
where mkbuildtree looks for headers when building the pv drivers.
Also add a check for the location of features.c.

Signed-off-by: Charles Arnold <carnold@novell.com>
16 years agox86: reduce the uses of CONFIG_COMPAT
Keir Fraser [Mon, 12 Oct 2009 11:56:00 +0000 (12:56 +0100)]
x86: reduce the uses of CONFIG_COMPAT

... to where it really is needed and meaningful (i.e. in some places
it seems to make more sense to use __x86_64__ instead).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
16 years agox86: trust new architecturally-defined TSC Invariant bit on Intel systems
Keir Fraser [Fri, 9 Oct 2009 08:34:03 +0000 (09:34 +0100)]
x86: trust new architecturally-defined TSC Invariant bit on Intel systems

Trust new architecturally-defined TSC Invariant bit (on
Intel systems only for now, AMD TBD).

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agox86: Fix build after ia64 grant-table build fix.
Keir Fraser [Fri, 9 Oct 2009 08:33:29 +0000 (09:33 +0100)]
x86: Fix build after ia64 grant-table build fix.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoxend: Fix VDI-VBD link for XenAPI
Keir Fraser [Fri, 9 Oct 2009 07:56:43 +0000 (08:56 +0100)]
xend: Fix VDI-VBD link for XenAPI

I found further problems with the VDI-VBD link:
  - For inactive managed domains, the VDI->VBD link was lost when
    xend restarted (or the host OS rebooted).
  - For active domains, both the VDI->VBD link and the VBD->VDI
    link were lost when xend restarted.

When xend is restarted, information about VDI instances is restored
from the vdi.xml file.  But the vdi.xml file does not contain the UUID
of the VBD, because xend does not write it there.  Therefore, the
VDI->VBD link is lost.  Likewise, information about VBD instances is
restored from xenstore, but xenstore does not contain the UUID of the
VDI.  Therefore, the VBD->VDI link is lost.

This patch solves these problems.  VDI instances no longer store the
UUID of the VBD.  Instead, xend gathers the UUID of the VBD each time
it is required, in the same way as for the Network->VIF link.
Information about VBD instances is restored not only from xenstore but
also from the config.sxp file, and the UUID of the VDI is restored
from the config.sxp file.

FYI, the VBD->VDI link of inactive managed domains is not lost, because
information about VBD instances is restored from the config.sxp file,
to which xend writes the UUID of the VDI.
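
Gathering the VDI->VBD link on demand, instead of storing it on the VDI, could look roughly like this. The function name and the record layout (a dict of VBD records, each with a "VDI" key) are assumptions made for illustration, not xend's actual structures.

```python
def vbds_of_vdi(vdi_uuid, vbd_records):
    """Return the UUIDs of all VBDs whose record points at the given VDI.

    `vbd_records` maps a VBD UUID to a dict that holds the UUID of its VDI
    under the "VDI" key (hypothetical layout).
    """
    return sorted(
        vbd_uuid
        for vbd_uuid, record in vbd_records.items()
        if record.get("VDI") == vdi_uuid
    )


vbd_records = {
    "vbd-1": {"VDI": "vdi-a"},
    "vbd-2": {"VDI": "vdi-b"},
    "vbd-3": {"VDI": "vdi-a"},
}
linked = vbds_of_vdi("vdi-a", vbd_records)
```

Since only the VBD side stores the link, a restart has to restore just one direction; the other is always recomputed from it.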

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
16 years agoblktap2: Check gcrypt library has MD5() function.
Keir Fraser [Fri, 9 Oct 2009 07:55:43 +0000 (08:55 +0100)]
blktap2: Check gcrypt library has MD5() function.

From: Dulloor <dulloor@gmail.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoFix the IA64 build of the hypervisor.
Keir Fraser [Fri, 9 Oct 2009 07:54:25 +0000 (08:54 +0100)]
Fix the IA64 build of the hypervisor.

This is completely untested, beyond confirming that it compiles.

Signed-off-by: Steven Smith <steven.smith@citrix.com>
16 years agoxend: Fix bug in superpage flag handling
Keir Fraser [Fri, 9 Oct 2009 07:53:42 +0000 (08:53 +0100)]
xend: Fix bug in superpage flag handling

During testing I discovered that using a bootloader magically clears
the superpage flag out of the config.  This small patch fixes that
behavior.

From: Dave McCracken <dcm@mccr.org>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>